ABSTRACT

Vector quantization is being widely used in image data
compression applications due to the facts that it is capable
of achieving fractional bit rates with reasonable complexity
and that the decoding is a very simple table look-up scheme.
In image encoding, a vector quantizer accepts a block of
pixels and outputs an address of the best matching tile
stored in a codebook. The matching algorithm requires a large
number of basic arithmetic operations in typical
applications. Since real-time coding is required in many
video applications, the need for dedicated processing
architectures arises naturally. This paper investigates the
mapping of VQ algorithms onto an array processor to achieve
near real-time compression of video images.


1. INTRODUCTION

Image data compression algorithms are used to reduce the
number of bytes required to represent a digitally encoded
image. The primary applications of image compression are to
minimize communication bandwidth for image transmission and
to minimize the amount of memory required for image storage.
Typical television images have about 512x512 pixels per frame
with a frame rate of 30 frames/s. When digitally encoded at
8 bits per pixel intensity resolution, the required digital
transmission rate is nearly 63 million bits/s [1]. High
quality color graphics displays have 1024x1024 24-bit pixels
per frame requiring 3 Mbytes of storage space. As frame sizes
increase, the need for efficient image compression algorithms
becomes even more acute.
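
The two figures quoted above follow from simple arithmetic. The sketch below reproduces them, taking the television example as 512x512 pixels, 8 bits per pixel, and 30 frames/s; the function names are illustrative and do not come from the paper.

```python
# Raw (uncompressed) data-rate arithmetic for the two examples in the text.

def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed transmission rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def frame_bytes(width, height, bits_per_pixel):
    """Storage needed for one uncompressed frame, in bytes."""
    return width * height * bits_per_pixel // 8


tv_rate = raw_bit_rate(512, 512, 8, 30)       # television example
display_bytes = frame_bytes(1024, 1024, 24)   # color graphics display example

print(tv_rate / 1e6, "million bits/s")        # just under 63 million bits/s
print(display_bytes / 2**20, "Mbytes")        # 3 Mbytes
```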

Image compression algorithms are characterized by compression
rate, distortion, and computational complexity. The
compression ratio is determined by dividing the number of
bytes required to represent an image by the number of bytes
needed to represent the compressed image. Distortion is a
measure of the error introduced into an image by the encoding
and decoding processes. Computational complexity is an
indication of the number of arithmetic calculations necessary
to compress and decompress an image. Computational complexity
is a very important consideration in algorithm
implementation, especially for image transmission
applications where the time available for encoding is limited
by the frame rate.

Most compression algorithms can be classified either as
scalar quantizers or vector quantizers. Scalar quantization
algorithms achieve compression by encoding each input data
value into a single codeword while vector quantizers compress
data by mapping a sequence or group of scalar values into a
single codeword [2]. A result of Shannon's rate-distortion
theory is that better performance is always achievable "in
theory" by coding vectors instead of scalars. This makes
vector quantization algorithms attractive for applications
requiring high compression rates.

A disadvantage of vector quantizers is that they are often
more complex than scalar quantizers. For example, the
tree-search vector quantization algorithm implemented in this
work would require about 400 million integer arithmetic
operations per second to encode the television image
described above. Currently, single DSP or microprocessor
chips (such as the AT&T DSP32C or Intel i860) cannot maintain
this computational rate [3, 4]. Array processors provide a
feasible solution to this problem and can be scaled to meet
the computational requirements of different applications.

The Video Analysis Transputer Array (VATA) is a flexible
reconfigurable array processor that has been designed and
built at Naval Ocean Systems Center (NOSC) in San Diego
[5, 6]. The VATA is an array of Inmos T800 transputers
connected with software reconfigurable communication links.
It has been designed for use as a variable architecture array
processor testbed for matching optimal array configurations
to a wide variety of image and signal processing algorithms.
The array resides in an IBM PC-AT host and has a high speed
interface to a frame grabber for image processing
applications.

Two vector quantization (VQ) algorithms have been mapped onto
the VATA. Optimal array architectures have been found for
each quantizer. The effects on performance of code
optimizations, memory limitations in processing nodes, and
overlapping of communication and computation have also been
investigated.


2. VECTOR QUANTIZATION ALGORITHMS

The full and binary tree-search VQ algorithms are considered
in this work for implementation on a reconfigurable array
[2]. The main advantage of the full search algorithm is a
moderate memory requirement. Unfortunately, full search is
extremely computationally intensive. The binary tree-search
algorithm needs twice as much memory but can encode an image
with far fewer arithmetic operations.

2.1 Full Search VQ Algorithm


The full search algorithm encodes an input vector X into an
index (or "codeword") i using a fixed set of vectors Y_n,
known as a "codebook". The input vector is compared to each
of the vectors in the codebook. A distortion measure D_n is
calculated for each codebook vector Y_n, and the minimum
distortion D_i is found, corresponding to codebook entry Y_i.
The index i of the codebook vector that produced the minimum
distortion is the output of the encoder. The decoder operates
in reverse, taking the index i as input and producing vector
Y_i from an identical codebook as output. A block diagram of
this algorithm is shown in Figure 1. The compression ratio is
calculated by dividing the number of bytes in each input
vector by the number of bytes required to represent the
index.
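
The encoder and decoder described above can be sketched in a few lines. This is a minimal illustration, not the transputer implementation: it assumes the squared Euclidean distance as the distortion measure, and the names `distortion`, `encode`, `decode`, and `compression_ratio` and the small example codebook are hypothetical.

```python
def distortion(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def encode(x, codebook):
    """Full search: return the index of the codebook vector closest to x."""
    best_index, best_d = 0, distortion(x, codebook[0])
    for n in range(1, len(codebook)):
        d = distortion(x, codebook[n])
        if d < best_d:  # keep the first minimum when there is a tie
            best_index, best_d = n, d
    return best_index


def decode(i, codebook):
    """Decoding is a table look-up in an identical codebook."""
    return codebook[i]


def compression_ratio(vector_bytes, index_bytes):
    """Bytes per input vector divided by bytes per transmitted index."""
    return vector_bytes / index_bytes


# Hypothetical 4-entry codebook of 4-element vectors (2-bit index).
codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [0, 0, 255, 255], [255, 255, 255, 255]]
i = encode([12, 9, 11, 8], codebook)   # closest entry is [10, 10, 10, 10]
y = decode(i, codebook)
```

With 16-byte input vectors and a 1-byte index, `compression_ratio(16, 1)` gives 16.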

For image compression, vectors are formed from p x p blocks
of pixels taken from the image to be encoded. Each p x p
block is mapped into a p^2 x 1 vector for input to the
encoder and each p^2 x 1 decoded output vector is reorganized
into a p x p block in the output image. The distortion
measure is the square of the Euclidean distance between
vectors X and Y:

$$d(X, Y) = \sum_{i=0}^{p^2-1} (x_i - y_i)^2$$

where d is the distortion and x_i and y_i are the elements
of vectors X and Y respectively.
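
The block-to-vector mapping can be illustrated as follows. The sketch assumes the image is a list of rows whose dimensions are multiples of p, and that each block is flattened row by row; the paper does not state the ordering, so that choice and the function names are assumptions.

```python
def image_to_vectors(image, p):
    """Split an image into p x p blocks, each flattened to a p*p vector."""
    height, width = len(image), len(image[0])
    vectors = []
    for top in range(0, height, p):
        for left in range(0, width, p):
            vectors.append(
                [image[top + r][left + c] for r in range(p) for c in range(p)]
            )
    return vectors


def vectors_to_image(vectors, height, width, p):
    """Inverse mapping: place each p*p vector back as a p x p block."""
    image = [[0] * width for _ in range(height)]
    blocks_per_row = width // p
    for k, v in enumerate(vectors):
        top = (k // blocks_per_row) * p
        left = (k % blocks_per_row) * p
        for r in range(p):
            for c in range(p):
                image[top + r][left + c] = v[r * p + c]
    return image


# 4x4 test image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
vectors = image_to_vectors(image, 2)
restored = vectors_to_image(vectors, 4, 4, 2)
```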

A full search VQ with an R-bit index requires 2^R distortion
measure calculations for each input vector and a codebook
size of 2^R vectors. Each distortion calculation requires
p^2 * (2 adds + 1 multiply) integer arithmetic operations.
Each vector stored in the codebook will occupy p^2 bytes at
8 bits per pixel.
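
As a worked check of these counts, the sketch below evaluates them for the parameters used later in the paper (R = 8, p = 4, 512x512 frames). The function names are illustrative only.

```python
def full_search_ops_per_vector(R, p):
    """2^R distortion calculations, each costing p^2 * (2 adds + 1 multiply)."""
    distortion_calcs = 2 ** R
    ops_per_calc = p * p * 3
    return distortion_calcs * ops_per_calc


def full_search_codebook_bytes(R, p):
    """2^R codebook vectors of p^2 bytes each (8 bits per pixel)."""
    return (2 ** R) * p * p


ops_per_vector = full_search_ops_per_vector(8, 4)    # 256 * 48 = 12288
vectors_per_frame = (512 // 4) * (512 // 4)          # 16384 blocks of 4x4
ops_per_frame = ops_per_vector * vectors_per_frame   # about 200 million
codebook_size = full_search_codebook_bytes(8, 4)     # 4096 bytes
```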

2.2 Binary Tree-Search VQ Algorithm

The codebook of a binary tree-searched VQ with an R-bit index
is organized in R levels. Each level, L_l, contains 2^l
entries grouped in pairs, where l = 1, 2, ..., R. An input
vector is compared with the two entries of the first codebook
level and the "closest" codebook vector is selected using the
minimum distortion measure. The result determines which pair
of codebook vectors in the second level will be compared to
the input vector. The process is repeated for each level of
the codebook with two distortion measure calculations being
made at each level. The binary decisions made during the
"path" through the codebook "tree" form the output codeword.

A tree-searched VQ with an R-bit index requires only 2 * R
distortion measure calculations for each input vector, making
this algorithm much more attractive for transmission
applications. Since an exhaustive search of the codebook is
not performed, decoded image quality is slightly degraded
compared to the full search algorithm. A second disadvantage
of tree-searched VQ is that 2 * (2^R - 1) vectors must be
stored in the codebook, nearly doubling the memory
requirements of full search VQ.
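
A binary tree search over such a codebook can be sketched as below. The storage layout is an assumption made for illustration: `levels[k]` holds the 2^(k+1) vectors of level k+1, and the pair that follows entry n of one level is stored at positions 2n and 2n+1 of the next. The paper does not specify its layout, and the tiny one-dimensional codebook is hypothetical.

```python
def distortion(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def tree_encode(x, levels):
    """Follow the closer of two entries at each level; the path is the codeword."""
    node = 0
    for level in levels:
        left, right = level[2 * node], level[2 * node + 1]
        bit = 0 if distortion(x, left) <= distortion(x, right) else 1
        node = 2 * node + bit   # append the binary decision to the path
    return node


def tree_decode(index, levels):
    """The decoder only needs the last level of the tree."""
    return levels[-1][index]


def tree_codebook_vectors(R):
    """Total vectors stored across all R levels: 2 + 4 + ... + 2^R."""
    return 2 * (2 ** R - 1)


# Hypothetical R = 2 codebook of one-element vectors.
levels = [
    [[64], [192]],
    [[32], [96], [160], [224]],
]
index = tree_encode([100], levels)   # two comparisons per level, 2 * R in total
```

For R = 8 the tree stores 510 vectors, against 256 for full search.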


3. THE VIDEO ANALYSIS TRANSPUTER ARRAY

The VATA uses one Inmos 20 MHz IMS T800 transputer at each
array node. The T800 includes a 32-bit central processing
unit (CPU), a 64-bit floating-point unit (FPU), 4 serial
communication links, 4 Kbytes of on-chip RAM, and an external
memory interface on a single chip [7]. The CPU can achieve a
sustained performance of 10 million instructions per second
(MIPS). Each serial link can transfer data between memory and
another link at 2.35 Mbytes/s (bidirectional). After
initialization, data transfer on one or more links can occur
simultaneously with CPU and FPU operations. This important
feature allows the overlapping of communication and
processing with very little performance degradation. An
additional transputer device, the C004 crossbar switch,
provides programmable configuration of the array
architecture.

A block diagram of the VATA hardware is shown in Figure 2.
The system consists of a standard NTSC camera, an RGB
display, and an IBM PC-AT host housing two
commercially-available boards (the Frame Grabber and the
Transputer Add-In Board) and two types of custom boards (the
VATA Interface and the VATA Processor).

The frame grabber is a Data Translation model DT2861
Arithmetic Frame Grabber for the IBM PC-AT. The frame grabber
can acquire video frames from the camera, store them, and
display them on the monitor at a frame rate of 30 frames/sec.
The frame grabber also has a high speed I/O port which can
transfer frame data to or from an external device at
10 Mbytes/sec.

Each VATA Processor (VP) board contains thirty-two T800
transputers and four C004 crossbar switches. Due to board
size limitations, VP T800s have no external RAM. The VATA
Interface (VI) board handles communication between the frame
grabber I/O port, the IBM PC-AT host, and one or more VP
boards. Frame data is passed between the frame grabber I/O
port and the VP boards while control and status messages are
passed between the IBM PC-AT bus and the VP boards. The
Transputer Add-In board (Inmos model IMS B008) is used to
compile, link, configure, and load programs onto the VI and
VP boards using the Inmos Occam Toolset software.


4. VATA SOFTWARE DEVELOPMENT

To map an algorithm onto the VATA, software must be developed
for the IBM PC-AT host and for the transputers. The host
program configures the array, controls the frame grabber
display and acquisition functions, and synchronizes data
transfer between the transputers and the frame grabber. The
IBM PC-AT host is programmed in C using the Microsoft C
compiler, version 5.0.

The development of transputer programs involves the selection
of an array architecture optimized for the algorithm and the
division of the algorithm into sub-tasks. Each transputer is
assigned a sub-task and each sub-task requires a different
program. Occam, a parallel programming language designed by
Inmos, is currently the optimum language for programming
transputers [8, 9].


Occam is a high level language that allows access to several
of the special features of the transputer to optimize
performance [10]. The technique of sequential loop
optimization greatly reduces array access times by replacing
loops with in-line code. Even though code size increases,
this type of optimization is very efficient when performing
arithmetic operations on vectors due to special features of
the transputer instruction set. The optimization of type
conversions is especially applicable to image compression
since vector elements (pixels) are stored as bytes and must
be converted to integers to perform distortion measure
calculations. A considerable time savings can be achieved by
storing the codebook bytes as integers. The amount of memory
required to store the codebook quadruples with this scheme,
making it impractical in cases where memory is scarce.
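
The two optimizations can be illustrated outside Occam. The sketch below only shows the structure of each: a loop replaced by in-line code, and the memory cost of storing codebook pixels as integers. It assumes a 4-byte integer (the T800 is a 32-bit device), and all names are hypothetical. The speed benefit reported in the paper is specific to the transputer and is not reproduced by this Python code.

```python
WORD_BYTES = 4  # assumed size of one integer on the 32-bit T800


def distortion_loop(x, y):
    """Squared distance computed with an indexed loop."""
    d = 0
    for k in range(len(x)):
        diff = x[k] - y[k]
        d += diff * diff
    return d


def distortion_unrolled4(x, y):
    """Same result for 4-element vectors with the loop written out in-line."""
    d0 = x[0] - y[0]
    d1 = x[1] - y[1]
    d2 = x[2] - y[2]
    d3 = x[3] - y[3]
    return d0 * d0 + d1 * d1 + d2 * d2 + d3 * d3


def codebook_bytes(num_vectors, vector_len, as_integers):
    """Codebook storage with pixels kept as bytes or widened to integers."""
    element_bytes = WORD_BYTES if as_integers else 1
    return num_vectors * vector_len * element_bytes


byte_size = codebook_bytes(256, 16, as_integers=False)  # pixels stored as bytes
int_size = codebook_bytes(256, 16, as_integers=True)    # pixels stored as integers
```

For a 256-vector codebook of 16-element vectors this is 4096 bytes against 16384 bytes, which is the quadrupling noted above.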

Maximum throughput is achieved in a multi-transputer system
by keeping each CPU and all links as busy as possible. This
is done by the technique of overlapping communication and
computation. Once a link communication operation has been
initialized, data transfer can occur without significantly
degrading processor performance. Use of this method triples
the amount of memory required for data buffering and
increases code size.
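
The structure of this overlap can be sketched with three concurrent stages: one receiving, one computing, and one sending, so that up to three buffers are in flight at once. The sketch uses Python threads and queues as stand-ins for transputer links. It is an analogy only, it does not model link timing, and the function name `run_pipeline` is hypothetical.

```python
import queue
import threading

_END = object()  # end-of-stream marker passed down the pipeline


def run_pipeline(frames, process):
    """Receive, compute, and send concurrently, one buffer per stage."""
    inbox = queue.Queue(maxsize=1)    # buffer filled by the input "link"
    outbox = queue.Queue(maxsize=1)   # buffer drained by the output "link"
    results = []

    def receive():
        for frame in frames:
            inbox.put(frame)
        inbox.put(_END)

    def compute():
        while True:
            frame = inbox.get()
            if frame is _END:
                outbox.put(_END)
                break
            outbox.put(process(frame))

    def send():
        while True:
            item = outbox.get()
            if item is _END:
                break
            results.append(item)

    threads = [threading.Thread(target=stage) for stage in (receive, compute, send)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


doubled = run_pipeline([[1, 2], [3, 4], [5, 6]], lambda f: [2 * v for v in f])
```

While one frame is being processed, the previous result can be leaving and the next frame arriving, which is why the buffering requirement grows to three buffers.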

5. IMPLEMENTATION OF VECTOR QUANTIZATION ALGORITHMS

The Vector Quantization algorithms implemented in this work
have been designed to achieve a compression ratio of 16.
Input vectors are formed from 4x4 blocks of pixels taken from
the frame grabber. Codebooks contain 256 vectors requiring
8-bit indices.

Programs that perform VQ encoding, decoding, and codebook
generation have been implemented for both the full search and
binary tree-search algorithms. Input images are 512x512
frames of 8-bit pixels. All of the programs described here
run with one VP board in the VATA. Parallelization is
accomplished by distributing the codebook among the VP
transputers.
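
One way to picture a distributed codebook is a linear array in which each node searches its own slice and forwards the running best match. The paper does not describe how partial results are combined, so the scheme below, the contiguous partitioning, and all names are assumptions made for illustration. The codebook length is assumed to be a multiple of the node count.

```python
def distortion(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def split_codebook(codebook, num_nodes):
    """Give each node a contiguous slice, tagged with its starting index."""
    per_node = len(codebook) // num_nodes
    return [
        (n * per_node, codebook[n * per_node:(n + 1) * per_node])
        for n in range(num_nodes)
    ]


def node_search(x, offset, local_codebook, best):
    """Update the running (distortion, index) pair using one node's slice."""
    for k, y in enumerate(local_codebook):
        d = distortion(x, y)
        if best is None or d < best[0]:
            best = (d, offset + k)
    return best


def linear_array_encode(x, slices):
    """Pass the input vector and running best match from node to node."""
    best = None
    for offset, local_codebook in slices:
        best = node_search(x, offset, local_codebook, best)
    return best[1]


# Hypothetical 8-entry codebook of one-element vectors, spread over 4 nodes.
codebook = [[10 * n] for n in range(8)]
slices = split_codebook(codebook, 4)
index = linear_array_encode([52], slices)
```

With 256 codebook vectors and thirty-two transputers, each node would hold 8 vectors under this partitioning.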

5.1  Full  Search  Vector  Quantization 

The full search encoding algorithm is extremely
computationally intensive, requiring 200 million integer
arithmetic operations to encode a single frame. Each integer
arithmetic operation typically requires several instructions
to complete when memory I/O and type conversions are
included. Since the thirty-two T800s on the VP board have a
total instruction execution rate of 320 MIPS, the computation
time to encode a single frame will be several seconds.
Decoding requires only a table look-up operation involving a
few memory I/O instructions and takes much less time than
encoding.
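
The "several seconds" estimate can be reproduced with a one-line model. The number of instructions per arithmetic operation is not given in the paper; the value 8 used below is a hypothetical stand-in for "several", and the function name is illustrative.

```python
def encode_time_seconds(ops_per_frame, instructions_per_op, total_mips):
    """Computation time if the work is spread evenly over all processors."""
    return ops_per_frame * instructions_per_op / (total_mips * 1e6)


# 200 million operations on 32 transputers delivering 320 MIPS in total.
# instructions_per_op = 8 is an assumed value, not a measured one.
estimate = encode_time_seconds(200e6, 8, 320)
```

Any value of a few instructions per operation puts the estimate in the range of seconds, far above the frame period of a 30 frames/s source.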

The VATA has been designed for real-time processing so the
communication (link I/O) time is typically on the order of
33 msec. This is much less than the computation time of
several seconds expected for full search VQ encoding. When
the computation time is much greater than the communication
time, variations in array architecture usually have little
effect on overall performance. To verify this claim, the full
search algorithm has been mapped onto a linear array and onto
an 8-column array. Array architectures are shown in Figure 3.
VI T800s are used for data transfer and reorganization only.

Encoding times for the full search algorithm are summarized
in Table 1. The test image is a face with simple background.
Effects of sequential loop optimization, type conversion
optimization, and communication overlap have been measured
for both array architectures. The type conversion
optimization could not be tested in the 8-column
configuration due to the 4 Kbyte memory limitation in the VP
T800s. Data transfer and reorganization time is measured by
removing the encoding process from the overlapped code.

Overlap of communication and computation produces less
speedup in the 8-column implementation than it does in the
linear implementation because the speedup from this
optimization is proportional to the number of transputers
that the data must pass through. The difference between data
transfer and reorganization times for the two array
architectures is mostly due to the fact that twice as many
transputers are performing these tasks in the 8-column case.
Even if this effect is neglected, the relative difference
between the measured times of 5.1 and 4.8 seconds for the
overlapped loop optimized cases is less than 6%. This
supports the claim that performance is not strongly
influenced by array architecture when computation time is
dominant.
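
The figure of less than 6% is reproduced when the difference is taken relative to the larger of the two measured times. That choice of reference is an assumption, since the text does not state it:

```latex
\frac{5.1 - 4.8}{5.1} = \frac{0.3}{5.1} \approx 0.059 < 0.06
```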

5.2 Binary Tree-Search Vector Quantization

Full search encoding requires 256 distortion measure
calculations for each input vector while binary tree-search
encoding requires only 8. This indicates that binary
tree-search encoding time will be approximately eight times
faster than full search encoding time. As a result,
computation time will not be much larger than communication
time and a multiple column array should perform significantly
better than a linear array.

Unfortunately, the structure of the binary tree-search
algorithm and the storage requirements for a larger codebook
combine to make a multiple column implementation impractical
due to the 4 Kbyte memory constraint in the VP T800s. Even
with a linear architecture, the memory limitation prevents
implementation of an optimal distribution of the codebook,
making encoding time image dependent. As a result, encoding
time for this algorithm is expected to be between 2 and 8
times faster than encoding time for the full search
algorithm. The memory limit also prevents the use of type
conversion optimizations.

To measure encoding time variations, timing tests have been
made using three input images. A best case image has been
constructed from optimally ordered vectors taken from the
last level of a default binary tree codebook. A second image
in which all pixels have been set to the same value is used
for a worst case test. The last image is the test image used
for full search VQ.

Encoding times for the optimal, typical, and worst case input
images are listed in Table 2. The encoding time for the best
case image is approximately equal to the data transfer and
reorganization time when optimization and overlap are
included. This indicates that encoding time is no greater
than 0.3 seconds, about eight times faster than the linear
implementation of the full search algorithm. Data transfer
and reorganization time is halved when all four VI T800s are
used in the 8-column implementation of the full search
algorithm. This indicates that performance would be
significantly improved by a multiple column architecture.

A "near real-time" demonstration of the binary tree VQ
algorithm has been developed to simulate videophone
applications. This program uses the linear array
implementation of the binary tree algorithm to encode and
decode 128x128 pixel images. Images are grabbed from the
video camera, processed, and displayed on the monitor. This
program runs at 7.5 frames/second.

Comparison with the results in Table 2 indicates that the
real-time program should run about twice this fast. The
limitation in this case is due to the I/O port of the frame
grabber. Even with all processing removed, the maximum frame
rate is 7.5 frames/second. Additional tests show that the
128x128 pixel program would run at about 12 frames/second
without frame grabber limitations. Without this limitation,
addition of a second VP board and inclusion of the unused VI
T800s in a "two-column" architecture would more than double
the frame rate for a "real-time" system. The use of 30 MHz
T800s would further increase speed.

6. CONCLUSION

Computationally intensive vector quantization algorithms have
been mapped onto the VATA and used to compress image data.
Different array architectures have been implemented and
algorithm performance has been compared for each
architecture. Array architecture has little effect on
performance when computation time is much greater than
communication time. In these cases, memory can be very
effectively traded for performance. Performance is maximized
in all cases when sequential optimizations are combined with
the overlap of communication and computation. Algorithms such
as multi-stage VQ combine the speed of tree-search with the
low memory requirements of full search. Multiple-column
implementations of this type of VQ algorithm should increase
performance significantly.

Figure 3. Examples of VATA array architectures


Table 1. Encoding times for full search VQ

Table 2. Encoding times for binary tree search VQ